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A linear algebraic method named the shifted conjugate-orthogonal-conjugate-gradient method is 
introduced for large-scale electronic structure calculation. The method gives an iterative solver 
algorithm of the Green's function and the density matrix without calculating eigenstates. The 
problem is reduced to independent linear equations at many energy points and the calculation is 
actually carried out only for a single energy point. The method is robust against the round-off 
error and the calculation can reach the machine accuracy. With the observation of residual vectors, 
the accuracy can be controlled, microscopically, independently for each element of the Green's 
function, and dynamically, at each step in dynamical simulations. The method is applied to both 
semiconductor and metal. 

PACS numbers: 71.15.-m, 71.15.Nc, 71.15.Pd 



I. INTRODUCTION 

Large-scale atomistic simulation with quantum me- 
chanical freedom of electrons requires manipulation of a 
large Hamiltonian matrix. In order to calculate physical 
quantities of a system, we should obtain either eigen- 
states or the density matrix of the system. The calcu- 
lation of eigenstates is usually reduced to matrix diago- 
nalization procedure and this procedure results in severe 
computational cost for a large-scale system. 

Any physical quantity X can be evaluated by means 
of the density matrix p as 



(X) = J J drdr'p(r,r')X(r',r). 



(1) 



Even though the density matrix is of long-range, only 
the short-range behavior of the density matrix is nec- 
essary, if X is a short-range operator. The energy 
and forces acting on an individual atom are really this 
case, and the locality of the Hamiltonian realizes this 
feature in large-scale calculation. Moreover, the den- 
sity matrix p can be obtained from the Green's func- 
tion. Therefore, the essential methodology for large- 
scale electronic structure calculation and molecular dy- 
namics (MD) simulation is how to obtain density ma- 
trix/; or Green's function without calculating eigenstates. 
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We have developed a set of methods for large-scale 
atomistic simulation without calculating eigenstates in 
fully quantum mechanical description of electron sys- 
tems. 

SHIl&Hmill Among them, the subspace di- 
agonalization method based on Krylov subspace (SD-KS 
method) was introduced, where the original Hamiltonian 
matrix H is reduced to a small size easy-tractable one and 
its diagonalization leads to approximation of the density 
matrix of the original system. [Tlj The first important 
feature of the SD-KS method is that we can monitor nu- 
merical accuracy during the simulation using a residual 
error of the Green's function, as shown in the present 
paper. The second important feature is that the SD-KS 
method can be used both for metallic and insulating sys- 
tems. We found, however, that the SD-KS method has 
a numerical instability, when the energy spectrum is cal- 
culated with a very fine energy resolution, as discussed 
in Appendix lAl 

Then our new strategy to obtain the Green's function 
and the density matrix is to solve linear equations with 
a given basis 



(z-H)\ Xj } = \j). 



(2) 



In the present case, the Hamiltonian H is real symmetric 
and (z — H) is not Hermitian but complex symmetric 
with a complex energy [z = E + £7). Once the linear 
equation is solved, one can obtain any element of the 
Green's function as 



G ij (z) = (i\(z-H)- 1 \j) = (i\x j ). 



(3) 



The numerical energy integration is required to obtain 
the one-body density matrix; 



Pij 



-- lmG l3 (E + i 7 ) / (^-^) dE, 



(4) 
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with the Fermi distribution function f(x) and a small 
imaginary part of energy (7 — ► 0+). The chemical po- 
tential n is determined so that the sum of the diagonal 
elements of the density matrix equals the total number 
of electrons. 

The aim of the present paper is to introduce 
the shifted conjugate-orthogonal conjugate-gradient 
(COCG) method, an iterative solver algorithm of cq. J2J- 
The Green's function and the density matrix arc ob- 
tained using eqs. © and (@J. The shifted COCG method 
shares the before-mentioned two features with the SD- 
KS method. Moreover, the third important feature of 
the shifted COCG method, different from the SD-KS 
method, is the robustness against round-off error and the 
calculation can reach the machine accuracy. 

The present paper is organized as follows; In Sec. [HI 
Krylov subspace will be explained and the shifted COCG 
method will be introduced. In Sec. lIIII the residual norm 
(RN) will be introduced to monitor the convergence be- 
havior of the method. The number of operations in ac- 
tual calculation is also discussed. Section HVI is devoted 
to application of the present method to an atomic scale 
reconstruction of semiconductor surface (silicon) and the 
bulk electronic structure of metal (copper). The con- 
clusion will be given in Sec. El In Appendix EI several 
numerical aspects will be discussed for the shifted COCG 
and the SD-KS methods and the difference between the 
two method will be clarified. 



II. SHIFTED CONJUGATE-ORTHOGONAL 
CONJUGATE-GRADIENT METHOD 

A. Shifted systems and Krylov subspace 

Now we should concentrate the method to solve eq. @ 
with a large matrix H and a fixed basis \ j). The method 
should be iterative and not require the matrix inver- 
sion procedure of (z — H). The problem is reduced 
to the linear equations, eq. ©, for a given set of en- 
ergy points z = zi,Z2,Z3 These linear equations are 

called 'shifted' linear equations or 'shifted' linear sys- 
tems in mathematical textbooks, because shifted matrices 
(zi — H). (z 2 — H), (Z3 — H),.. appear. If the equations 
are solved independently among different energy points, 
the total computational cost is proportional to the num- 
ber of energy points N ene . The essence of the present 
method is that we should solve the equation only at one 
energy point and the solutions at other energy points are 
given with a moderate computational cost. 

The present method is realized using Krylov subspace 
(KS). DSG3 Krylov subspace is defined for an arbitrary 
matrix A and vector as the linear space spanned by 
a set of states (vectors) {A n |j)}; 

K n ( A, 1 3) ) ee span { | j), A\j), A 2 \j), , 

(5) 

where n is the dimension of the KS. Iterative methods 
based on KS, such as the standard conjugate-gradient 



algorithm, are generally called Krylov subspace methods. 
In the present method, the solution vector \xj) of eq. J2J, 
is constructed within the KS of K n (z — H, |j)) at the n-th 
iteration. 

The present method, the shifted COCG method, 
is a combined method of two KS algorithms; (a) 
the conjugate-orthogonal conjugate-gradient method 
(COCG method) [l8j and (b) the theorem of collinear 
residual for shifted linear systems [l9|. The essential 
point is that the KS among shifted systems gives the 
same linear space 

K n ( Zl -H,\j)) = K n (z 2 -H,\j)). (6) 

The actual procedures are given in the next subsection. 



B. Shifted COCG method 

Here we present the formulation of the shifted COCG 
algorithm, following Ref. [l^; We pick out arbitrarily 
one energy point as 'reference' energy point z IC { = E re [ + 
ij. Equation @ at the reference energy (z = z ro f) is 
reformulated as 



Ax = b, 



(7) 



where the matrix A is defined as A = z ro f — H and the 
suffix j is dropped (jj) => b, \xj) =4> x). Since the matrix 
A is not Hermitian, the matrix-vector notation is used in 
this subsection, rather than the bracket notation. Here- 
after the equation at z = z ro f is called 'reference' system. 

For cq. (Q, we use the COCG algorithm 0, a stan- 
dard iterative algorithm for a linear equation with a com- 
plex symmetric matrix A. |20j At the n-th iteration, the 
solution vector x n , the residual vector r„, and the search 
direction vector p n are represented as 

x„ = x n -i + a„_ip„_i, (8) 
r„ = r„_i - a„_iAp„_i, (9) 
Pn = r n + /3„_ip„_i, (10) 

respectively. Here, coefficients a n and /?„ are given as 



Pn = 



PnApn ' 
T 

r n+l r n+l 



(11) 

(12) 



The initial conditions for iteration are Xq = p 1 = 0, 
ro = 6, /3_i = 0, ct-i = 1. Note here that the inner 
products are given as a T b(^= a H b). When the iteration 
number n reaches the matrix dimension of A, denoted 
M, the residual vector should be zero and the solution 
vector should be exact {tm = 0,Xm = A _1 6). 

Eliminating p n -x from eq. © with eq. (fTU|> . we obtain 
three-term recurrence relation for the residual vector; 



r n +i 



-a n Ar n 



1 



Pn-ldn 



Cin-1 



-r n -x- 



(13) 
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The most time-consuming part of the COCG algorithm 
is the matrix- vector product (Ap n ) in eq. ifTTJl. This 
matrix-vector product corresponds to the procedure for 
updating the KS; K n (A, b) => K n+1 (A, b). 

Similarly, we reformulate eq.J2J) with a shifted energy 
point z = z ro f + a as 



(A + al)x = b. 



(14) 



For the 'shifted' system, the n-th solution vector x° and 
the search direction vector p° are given as 



P", 



x n-l ' a n-lPn 
r n + Pn-lPn-l- 



(15) 
(16) 



The initial values of the vectors or coefficients are chosen 
to be the same as in the reference system. The equation 
corresponding to ea. 1(13(1 of the shifted system is 



al{A + aI)r° n 

^ , rn- 1 n 



(17) 



Since the KS between the reference and shifted systems 
are equivalent (K n (A, b) = K n (A + a I, b)), as stated in 
cq. JBJ, one can prove that the residual vectors between 
them, and r n , are collinear; 



1 



(18) 



which is the theorem of collinear residual for shifted linear 
systems. With eq. I|18f) . cq. 1(13(1 can be modified as 



needed only for the reference system, which reduces the 
computational cost drastically. The detail of the compu- 
tational cost will be estimated in Sec. IIII Bl 

We note here that the shift parameter o~(= z — z re {) is 
an arbitrary complex variable in principle but we choose 
the value to be real (a — E — E TC f) in all the practical cal- 
culations of the present paper. We also note that a prac- 
tical electronic-structure calculation can be parallelized 
with the present method, because the original problem 
of eq. 101 is independent with respect to the basis suffix 
3- 



III. CONVERGENCE BEHAVIOR AND 
COMPUTATIONAL COST 

A. Convergence behavior with residual norm 

Since the shifted COCG method is an iterative solver 
algorithm for eq. (0, we should establish a systematic 
procedure to find an optimal iteration number in the con- 
text of electronic structure calculation. In this section, 
such a systematic procedure is introduced by monitoring 
the norm of residual vector. 

At the 7i-th iteration, we denote the solution vectors 
for the reference and shifted systems as \xn ) and \xn^}, 
respectively. The corresponding residual vectors are writ- 
ten by 



|r«) = \j)-(z, cl -H)\x^) 



0)\ 



(23) 



K«) = \ j )-( Zlsl + a-H)K^) = ^-\r^) > (24) 



n+l 



"n+l 

K 



-a n {A + aI)r° n 



1 + CtnCT 



Pn—1 ^r, 

a n -i 



\ 

K+l On-l 



(19) 



Comparing the coefficients in eqs. (|T§j) and (|T?|l . we ob- 
tain 



n+l 



'n+l 



1 + a„ o~ 



Pn—1 ^n 
Otn-1 



/? w -lQ n 
Otn-1 



(20) 
(21) 

(22) 



with the initial values of 7Tq = ■n a _ l = 1. We can up- 
date the vector r£ and the coefficients and us- 
ing eqs. UHJ), (1201) ^ ISH> , an d which do not include 
any matrix- vector product. Consequently, the time- 
consuming procedure of the matrix-vector product is 



respectively. The last equality is given by eq. 1)18(1 . 

Since we need only the elements of the density matrix 
among near-sited orbital pairs or of the short-distance 
components in cq. , the convergence is necessary only 
for these components, but not for far-distance compo- 
nents. Therefore, we define a residual norm (RN) from 
the components only among these near-sited orbitals (|i)) 
that are determined by the interaction range of Hamilto- 
nian; 



.Cj)||2 



Kj')||2 = 



E \® r P)\ 



E K*K 



(25) 



II^H 2 , (26) 



where Mj nt is the number of interacting orbitals |i) for 
a basis \j), typically 10 to 10 2 . Note that the RN is an 
energy-dependent quantity. 

Since the Green's function should be integrated over 
the given set of energy points to obtain the density 
matrix, we need to know the convergence behavior of 
the RN over the entire energy range. Then we average 
the RN over the energy range (E m i n < E < E max or 



4 



E n 



E re r < a < E„ 



E r , 



■f • 



We call the resultant 



quantity the energy-averaged residual norm (a-RN) ; 
^ = - dE\\rl {j) \ 



R 



E 



mm 
2 



where 



Cj) 



? U) 





1 


'E min 





(27) 



(28) 



the a-RN, FCft' , can be monitored at every iteration 
(n) , the optimal number of iterations can be determined 

by the value of R% ' . The above determination is carried 
out for the microscopic freedoms or individually among 
the basis suffix j. 
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FIG. 1: Decay behavior of the a-RN for Si crystal with 512 
atoms. A universal curve appears with four different reference 
energy points that are placed at the band bottom (B), in the 
valence band (V), in the gap (G) and in the conduction band 
(C). See Fig. |HIa) for the actual positions of these energy 
points. 

As an example, we calculated the electronic structure 
in a bulk Si system of 512 atoms with a cubic simulation 
cell. We use a transferable Hamiltonian of silicon in the 
Slater-Koster form of s and p orbitals [2l|. The matrix 
dimension of the Hamiltonian H is M = 4x512 = 2048 
and the number of energy points is N cnc = 1000. The 
imaginary part of the energy (z = E + ij) is set to 
7 = 0.002au (=0.0544eV). Figure □ shows the decay be- 
havior of the a-RN. Here and hereafter, the starting basis 
|j) for silicon systems is set to an sp 3 -hybridized basis on 
an atom. We plot four cases with different reference en- 
ergy points. In result, all the cases follow an universal 
curve till the machine accuracy and the choice of the 
reference energy point does not affect the calculated a- 
RN. The behavior within a small iteration (n < 70) is 
also plotted in Fig. We should note that the observed 
decay at the early stage (n < 30) is important from prac- 
tical viewpoint, since the iteration number of n ~ 30 is 
enough for the application in Sec. IIV Al 



The convergence behavior was studied also in a bulk 
fee Cu of 1568 atoms, a metallic system. The simulation 
cell is a 7 x 7 x 8 supercell of the cubic unit cell. We con- 
structed the Hamiltonian matrix from the second-order 
form (i?*- 2 )) of the tight-binding linear muffin-tin orbital 
theory, In Fig. the a-RN is plotted with the start- 
ing bases of atomic s and t2 g orbitals. Since the present 
method is a general linear-algebraic theory with a short- 
range Hamiltonian (matrix), the convergence behavior 
shows no generic difference, between semiconductor and 
metal or between different starting orbitals. In fact, the 
curves in Fig. [21 behave similarly in magnitude with a 
large iteration number, for example n > 60, though dif- 
ferently with a smaller iteration number. 
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FIG. 2: Early stage in the decay behavior of the a-RN for Si 
crystal with 512 atoms and fee Cu with 1568 atoms. In the 
Cu system, the two cases are plotted with the starting vectors 
\j) of s and t2 S bases on an atom. 



B. Computational cost within one iteration 

Numbers of operations within one iteration are esti- 
mated in TablcQ] Two points are found for the drastic re- 
duction of computational cost, when the present method 
is compared to the conventional COCG method. For 
comparison, the case of the conventional COCG method 
is shown in the column labeled 'COCG', in which the 
conventional COCG methods is applied, independently, 
to all the systems (A^ ene systems) and the matrix-vector 
product governs the computational cost. In the shifted 
COCG method, on the other hand, the computational 
cost of the matrix- vector product is reduced by 1/N ene 
(MMi nt N cnc =>• MMint), since the actual matrix-vector 
product is carried out only for the reference system, as 
discussed in Sec. Ill Bl Then the scalar- vector products 
may give a significant contribution to the computation, 
if all the elements (M elements) of the vectors x^, 
and are calculated for all the systems, which is shown 
in the column labeled 'sCOCG (total)'. In the present 
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calculation, however, we need the elements of these vec- 
tors only within the interaction range (M- mt elements), 
as discussed in Sec. MI Al Since the vectors x°, Pni 
and in the shifted systems (N cno — 1 systems) are 
updated using eqs. (TBJ, l(TB|) . (TSJ, the update proce- 
dure can be carried out only for the necessary elements 
(Mi n t elements) . The calculation only for these elements 
gives another drastic reduction of the computation cost 
(3M(iV on o - 1) => 3M int (7V ono - 1)), which is shown in 
the column labeled 'sCOCG (present)'. 



IV. APPLICATIONS TO ELECTRONIC 
STRUCTURE CALCULATION 

A. Reconstruction on Si(OOl) surface 

The MD simulation with the shifted COCG method 
was tested in Si(OOl) surface reconstruction. The calcu- 
lated system is a slab of 1024 atoms constituted of 16 
layers with 64 atoms on each layer. The temperature 
parameter of the Fermi distribution function in eq. Q 
is set to T=0.005au(=0.136eV). We use the Hamiltonian 
and energy function in Ref. [21|. Other methodologi- 
cal details arc the same as in Sec. 1111 Al The atomic 
structure is relaxed, with the Hcllmann-Feynman force 
on atoms, from an appropriate surface atomic configura- 
tion into the ground-state structure. Since the force on 
atoms is given after the calculation of the density matrix, 
the simulation was carried out by a double-loop itera- 
tive procedure; The inner loop is the iterative procedure 
of the shifted COCG method for calculating the den- 
sity matrix with a given atomic configuration or a given 
Hamiltonian. The outer loop is the update of the atomic 
configuration, with force on atoms, to as to minimize the 
energy. When the KS dimension, or the iteration number 
of the inner loop, is n = 30 or larger, the resultant surface 



Method 


Inner 
Product 


Scalar- Vector 
Product 


Matrix- 
Vector 
Product 


COCG 


3MJV ene 


SMN en e 


MMmtNene 


sCOCG (total) 


3M 


3MA cnc 


MM int 


sCOCG (present) 


3M 


3M int (iV cn e-l) + 
3M 


MM int 



TABLE I: Numbers of operations within one iteration; (i) 
Inner product in eqs. 1111 and 1121 1. (ii) Scalar- vector prod- 
uct in eqs. JSJ, @, E3J an d eqs. ftn) . lfTo]l . <(T£jl and (iii) 
Matrix- vector product in Ap n of eq. lilt . The parameters 
in the table are as follows; M: the dimension of the origi- 
nal Hamiltonian matrix, Mint: the number of orbitals within 
interaction range for one orbital, iVene: the number of en- 
ergy points. Here, the cases of the three methods are plotted; 
(l)the conventional COCG method, labeled 'COCG, (2)the 
shifted COCG method with the calculation of all the elements 
of the Green's function and the RN, labeled 'sCOCG (total)' 
and (3)the actual procedure in the present calculation, labeled 
'sCOCG (present)'. See the text for details. 
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FIG. 3: Result of a slab for Si(001) surface with asymmetric 
surface dimers; (a) Local density of states . —(l/w)ImGjj) 
for surface and bulk atoms, calculated with the KS dimension 
of n = 30. The surface atoms are classified into the upper and 
lower atoms of the asymmetric dimer, illustrated in the inset. 
The 'bulk' atom is an atom on deeper layers of the present 
slab system, (b) Decay behavior of a-RN for the atoms that 
appear in (a). The horizontal dotted line is an eyeguide for a 
typical convergence criteria. 



atoms form asymmetric dimers illustrated in the inset of 
Fig. Eta), as should do. (Hl^] The resultant tilt angle 
of the asymmetric dimers is 8 = 13.4°, which a gree s with 
experimental values of 9, between 5° and 19°. [2f| 

Figure Et a ) shows the imaginary part of the Green's 
function — (l/7r)ImGj-j summed up over the orbitals 
within a specific atom, which corresponds to the local 
density of states (1DOS). As a general property of the KS 
method, the number of peaks in — (l/7r)ImGj-j equals the 
iteration number or the KS dimension. When the 1DOS 
of the surface and bulk atoms are compared, the 1DOS 
of the surface atoms have characteristic peaks within 
— leV < E < +0.5eV, because the upper surface atom 
has an occupied surface state and the lower one has an un- 
occupied surface state. The reproduction of these surface 
states is the reason why the correct surface reconstruc- 
tion is reproduced, even with a small number of the KS 
dimension (n = 30). 
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fined as follows; |2£ 



-4 

Energy [eV] 

FIG. 4: Result of fee Cu with 1568 atoms; Partial density of 
states ImGjj for each orbital. 



Figure EHb) shows the a-RN for these atoms as the 
function of the iteration number or the KS dimension. 
Here the a-RN for an atom is defined as the average of 
the a-RN, eq. (|27|l . among the orbitals within the atom. 
In result, the a-RN for the surface atoms decays simi- 
larly, while that for the bulk atom faster. Considering 
the fact that the required number of iterations is n=30 
to obtain appropriate surface reconstruction, a practical 
convergence criteria is estimated as the horizontal dotted 
line in Fig.^b) on the order of R^/R^ ~ 1CT 3 . If this 
convergence criteria is used, the optimal number of iter- 
ations is approximately 18 for the bulk atom, less than 
that for the surface atoms (n = 30). In other words, 
the optimal iteration number is determined for micro- 
scopic freedoms or independently among atoms or bases 
Moreover, the microscopic control can be carried 
out dynamically, or at every step in MD simulations. In 
short, the observation of the a-RN gives a definite way 
of controlling the accuracy microscopically and dynami- 
cally, which is important among practical investigations. 



Cu-AE) = --Y i ImG Ia , J0 (E + ij)H Wa ,(29) 

IT z — ' 

Cu(E) = Y^CuAE), 



(30) 



where / and J denote the atomic positions and a and 
(3 orbitals. The quantity Cu is the COHP and we call 
the quantity C /J:Q partial COHP (PCOHP). The energy 
integration of the COHP (ICOHP) has the dimension of 
energy. The off-site term of ICOHP gives a quantitative 
discussion of the cohesive mechanism, because its nega- 
tive and positive parts are the energy gain and loss in the 
electronic structure energy for cohesion, respectively. 
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B. Metal system: fee Cu 



FIG. 5: Result of fee Cu with 1568 atoms; COHP and ICOHP 
for a first nearest-neighbor atom pair (a) and PCOHP for a 
first (b) or second (c) nearest-neighbor atom pair. 



We calculated also the electronic structure of a bulk 
fee Cu of 1568 atoms. The technical details are already 
explained in Sec. IIII Al The well-converged partial DOS 
is shown in Fig. 0] for s, p, e g and t2 9 orbitals. The 
result reproduces the essential characteristics, e.g. the 
resonance behavior of s and p orbitals and the energy 
separation between e g and t2 9 orbitals. 

Another important property for analyzing cohesion is 
the crystal orbital Hamiltonian populations (COHP) dc- 



Figure|SJa) shows the COHP and ICOHP for a nearest- 
neighbor atom pair. The PCOHP Cu- a (E) is also shown 
in Fig. |3Jb) only for the orbitals (a) with major contri- 
bution. Since a nearest-neighbor pair lies along (011) 
direction in fee, the significant values in the PCOHP 
come from a=s, (p y ± p z )/V2 and d yz orbitals. Fig- 
ures 13 a) (b) show that the two characteristic peaks of 
the COHP at E ~ —5 eV and —3 eV arc contributed 
mainly by the PCOHP of the d yz orbital. The negative 
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and positive peaks originate from the bonding and anti- 
bonding coupling, respectively, of the t 2g orbitals among 
the nearest-neighbor atom sites. Though the two cor- 
responding peaks can be seen also in the PDOS of the 
t 2g orbital in Fig.H the COHP, unlike PDOS, informs us 
the bonding or anti-bonding character of the correspond- 
ing state. From Fig. EJb), we found that contributions 
from s and p orbitals are also appreciable. Moreover, 
the PCOHP for a second nearest-neighbor pair is plot- 
ted in Fig. E[c). Since a second nearest-neighbor pair 
lies along (001) direction, the significant values in the 
PCOHP come from s, p z and d^, z 2_ r 2 orbitals, though 
their magnitude is one order smaller than those for the 
first neighbor pair. 

The present analysis demonstrates that, since the 
shifted COCG method can give the Green's function with 
the machine accuracy, the resultant spectra reproduce 
the correct cohesive mechanism, in which the role of each 
orbital is well described not only for the major contribu- 
tion from the first nearest-neighbor coupling but also for 
a minor contribution from the second nearest-neighbor 
coupling. 



V. CONCLUSIVE DISCUSSION 

In the present paper, we introduced the shifted COCG 
method based on the Krylov subspace and used the 
method as an iterative solver algorithm of the Green's 
function in large electron systems. We analyzed the con- 
vergence behavior by means of the a-RN, which estab- 
lishes a definite way of controlling accuracy. The theory 
realizes a practical method not only for MD simulations 
but also for obtaining the fine-resolution spectra, such 
as (P)DOS and COHP, without calculating eigen states. 
The method was applied to semiconductor and metal and 
the above statements were confirmed numerically. 

When the present method is compared with the SD- 
KS method, we conclude that the two KS methods are 
complementary in the practical viewpoint and we should 
choose one, according to the purpose; If one would like to 
obtain the Green's function in a very fine energy resolu- 
tion, the shifted COCG method should be used, because 
it can reach the machine accuracy. On the other hand, 
as is discussed in Appendix IA 21 the SD-KS method is 
suitable for obtaining the density matrix without numer- 
ical energy integration of the Green's function, which is 
a typical situation in MD simulations. 

Finally, we point out the generality of the present the- 
ory. Since the shifted COCG method is based on a gen- 
eral linear-algebraic theory with large matrices, it is ap- 
plicable not only to electronic structure calculation with 
atomic orbital bases but also to calculation with other 
bases. Moreover, the method may be useful in many the- 
oretical fields other than electronic structure theory, if a 
theory is reduced to a set of shifted linear equations. 
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FIG. 6: (a) Well-converged Green's function and (b) con- 
vergence behavior at different energy points. The calcula- 
tion was carried out with Si 512 atoms and the top of the 
valence band locates at E = eV; (a) Imaginary part of 

the well-converged Green's function ( lxaGjj(E + 1^)) with 

7 = 0.002au. (b) The iteration numbers to fulfill the converge 
criterion ||ri J 'j| 2 < 10 £ with e = -4 to -16. Note that four 
energy points are picked out, at the band bottom (B), in the 
valence band (V), in the gap (G) and in the conduction band 
(C), for discussion (see text). 



APPENDIX A: NUMERICAL ASPECTS IN 
KRYLOV-SUBSPACE METHODS 



This appendix is devoted to several aspects of the two 
KS methods (i) the shifted COCG method, the main sub- 
ject of this paper, and (ii) the SD-KS method Par- 
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number of iterations (n) 



FIG. 7: Iteration dependence of (a) the residual norm (RN) 
in the present method and (b) the 'total' residual norm (t- 
RN). The symbols 'B', 'V, 'G' and ! C' indicate the energy 
points that are shown by arrows in Fig.[S](a). 



ticularly, numerical aspects will be discussed, including 
robustness against round-off error. Examples are demon- 
strated with silicon crystals, as in Sec. IIIII 



1. Shifted COCG method 

For the shifted COCG method, we discuss (I) the con- 
vergence behavior among different energy points and (II) 
the convergence behavior of long-distance component of 
the Green's function. These discussions clarify how and 
why the present method is so successful. 

First, we used the conventional COCG method at all 
the energy points (N ene — 1000 points), so as to inves- 
tigate the convergence behavior among different energy 
points. Figure Efa) shows the imaginary part of the well- 
converged Green's function, corresponding to the density 
of states (DOS). The spectrum consists of a set of spikes 
with a finite width, owing to a small imaginary part of 
energy (7 = 0.002 au=0.0544cV). The required itera- 
tion numbers with various convergence criteria are seen 
in Fig.|SJb), in which the convergence criterion is set to 



be \\r%'\\ 2 < 10 e with e = -4 to -16. The resultant 
DOS profiles among these criteria are indistinguishable 
from that of Fig. HJa) ■ Figures H>f a) and (b) indicate 
that an energy point with a larger value of DOS requires 
a larger number of iterations, because the KS dimension 
should be larger in order to distinguish individuals among 
densely distributed nearby states. 

In Fig. 0(a), the decay behavior of the RN is plotted 
for the four chosen energy points that were already dis- 
cussed in Fig.0and Fig.El(a). When Fig. 0(a) and Fig.0 
are compared, one can see that the decay behavior of the 
RN are quite different among the four energy points but 
the a-RN is universal, as should be from cq. (jT%|l . For 
example, we pick out the case in which the reference en- 
ergy is chosen as the point labeled by (B). In the case, the 
RN I |ri"^ 1 1 2 and the shift coefficient n° go to the extreme 
orders of 10~ 250 and 10 +250 , respectively, at n = 300. 
Even in such an extreme case, the computational proce- 
dure works well and the a-RN follows the universal curve, 
as in Fig. 

Second, we discuss the convergence behavior of the 
Green's function including its long-distance component, 
unlike in Sec. IIIII For monitoring the convergence be- 
havior, we should define the 'total' residual norm (t-RN), 
instead of the RN in eq. as 

M 

l|r«|| 2 = EK^)! 2 ' (Al) 

i 

where the elements are summed up among all the bases 
(M bases). In other words, the t-RN shows the conver- 
gence behavior for all the elements of the Green's func- 
tion (dj or G(r,r')), including its long-distance compo- 
nents. We should recall that the RN in eq. (|25[) is defined 
only for the short-distance components. The results are 
shown in Fig. 0(b) , in which the decay behavior is slower 
than that in Fig. 0(a) at an earlier stage (n < 30), though 
the behavior is the same at later stages. The difference at 
the earlier stage appears, because the accurate descrip- 
tion of the Green's function at further distances needs a 
larger number of KS bases or a larger iteration number. 
Moreover, the computational cost is enormous to calcu- 
late all the elements of the Green's function, as shown 
in the 'sCOCG(total)' case of Table HJ The present dis- 
cussion clarifies that only the short-distance components 
of the Green's function are required and calculated in 
practical applications. 

2. Subspace diagonalization method 

Here the subspace diagonalization method based on 
the KS (SD-KS method) [Hi is discussed for the compar- 
ison with the shifted COCG method. Though the two 
methods, commonly, give the density matrix or Green's 
function within the KS, the difference between them 
comes from the computational cost and the effect of nu- 
merical round-off error. 
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The practical procedure of the SD-KS method is sum- 
marized as follows; an orthogonal basis set for the KS 
is constructed by the Lanczos process, a three-term re- 
currence relation; {\K[ j) ) = \j), \K^ ] ), ...\I<1 3) )}. EH 
Here the number of bases v is the dimension of the 
KS, K„{H, \j)). This process creates simultaneously 
the reduced Hamiltonian matrix H K ^> within the KS 
{{H K ^) nm ee {K^\H\K^)), as a small tridiagonal ma- 
trix. Then the reduced matrix H K ^> is diagonalized 

(i) 

and we obtain the eigen values ed and the coefficients 

Can {a = 1,2, ....v), where the eigen vectors are given as 

\ w ^) = E£=i C a 3 l\K n 3) ). The Green's function is given 
as 



(i\G\j)^^{i\K^){K^\G^\j) (A2) 



with the definition of 



E 



(A3) 



The density matrix can be given in a similar manner. [Tll | 
The calculated band structure energy shows a rapid con- 
vergence as the function of the number of the KS bases 
v, both in semiconductor ^lj and metal (fee Cu, un- 
published). In both cases, the result is well converged, 
typically, with v = 30 . We have simulated the recon- 
struction on Si(001) surface. [Tl| as in Sec. II V Al and the 
resultant atomic position or 1DOS agrees with the ones 
obtained by the shifted COCG method. 

Several differences in numerical treatment are found 
between the SD-KS method and the shifted COCG 
method. The SD-KS method gives the Green's function 
analytically in cq. (|A3|) and its energy integration for the 
density matrix, eq can be given also analytically, 
while the shifted COCG method gives the Green's func- 
tion numerically on a given set of energy points and its 
energy integration should be carried out with a careful 
observation of the numerical error. Moreover, the com- 
putational cost of the SD-KS method is smaller than, 
typically a half of, that of the shifted COCG method, 
because the SD-KS method requires only real vectors, 
such as \K n i) ),\wi ) ), while the shifted COCG method 
requires several complex vectors, such as \xn ), |i"n )• 

Hereafter we discuss a crucial difference of the SD- 
KS method from the shifted COCG method; the SD- 
KS method shows a numerical instability with a very 
large number of KS bases (v), owing to the accumulation 
of round-off error. The above instability is analyzed by 
introducing the RN with eq. i|23[l . An clement of the RN 



is given as 

(i|r<?>) 



= (i\I-(z-H)Gy\z)\j) 



m E 



(i\z - H\K&}C<£ 



a,n—l 



z — e. 



U) 



(A4) 



This quantity can be calculated with a negligible compu- 
tational cost, since all the quantities in eq. l)A4jl . that is 

{e^}, {Ci J 2}, {(i\K n j) )} and {{i\H\K&)}, are always 
calculated in the generating procedure of the Green's 
function G^\z). The a-RN R [ J ] can be defined in a 
similar way as in eq. (|27[1 . The a-RN was examined 
for Si crystal in different system sizes (512, 4096, and 
32768 atoms), and the results are shown in Fig. [5] Here 
a problematic situation appear in the cases of 512 and 
4096 atoms, because the a-RN begins to grow after an 
appropriate value of the KS dimension v. So as to an- 
alyze the growth of error, the RN with the case of 512 
atoms is shown in Fig. with its energy dependence. 
The spectrum consists of only spikes before the growth 
of error (y = 30, Fig.^a)), while a finite background ap- 
pears after the beginning of the growth of error (y = 100, 

Fig. EflM). 
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FIG. 8: Decay behavior of the a-RN in the SD-KS method 
for Si crystals with 512, 4096 and 32768 atoms. 

The growth of error occurs, because the loss of orthog- 
onality ((Kn \Km ) 7^ $nm) always happens in actual 
calculation after a long iteration of the Lanczos proce- 
dure, owing to the accumulation of round-off error. In 
such a case, the calculated eigen vectors \wa^) have a 
finite deviation |£a ') from the true eigen vectors; 



H K U)\ W W) 



if 



G0\ 



(A5) 
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FIG. 9: Energy dependence of the RN in the subspace di- 
agonalization method for Si crystal with 512 atoms. The KS 
dimension is (a) v = 30 and (b) v — 100. 



Using cas. (|A4|) and i|A5[) . wc obtain 

(i|r«) = (i\I-(z-H)GP(z)\j) 

A {i\z-H\w < i ) ){w ( i ) \j) 



m-J2 

OL 

(i\d) 



z - e 



U) 



(A6) 



where wc define SH K ^ = H — H K V) and use 



\d) = \j)-J^\w^)(w^\j) 

a=l 

= j^\K^)(K^\K^). 



(A7) 



n=1 



The last equality of eq. (|A7|) is given by 



(A8) 



and \K{ 3 ') = \j). The first and second terms in eq. (|A6(I 
correspond to the finite background and the spiles in 
Fig.|9jb), respectively. With a small number of bases, as 
in Fig. Efa), the orthogonality between the bases holds 
exactly and we obtain \d) = |£« ) = 0. Therefore, the 
energy-independent term, the first term of eq. (|A6|) does 
not appear. The spikes in Fig. |5Ja) appear, because the 
reduced Hamiltonian matrix in the KS is deviated from 
the original one (6H K< J) =H- H K ^ ^ 0). 
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